Lessons from building Manus - an AI agent framework that leverages in-context learning of frontier models
"Context engineering is still an emerging science—but for agent systems, it's already essential."
At the beginning of the Manus project, we faced a key decision: train an end-to-end agentic model from scratch, or build the agent on top of the in-context learning abilities of frontier models?
Our choice: Bet on context engineering
Why? We can ship improvements in hours instead of weeks, and the product stays orthogonal to progress in the underlying models
We call our manual process of architecture searching, prompt fiddling, and empirical guesswork "Stochastic Graduate Descent"
The single most important metric for a production-stage AI agent: KV-cache hit rate
Why it matters:
Cached input tokens cost 0.30 USD/MTok vs 3 USD/MTok for uncached - a 10x difference!
Common mistake: including a timestamp, especially one precise to the second, at the beginning of the system prompt kills your cache hit rate
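A minimal sketch of the idea, with placeholder prompt text: the cache-busting version changes on every call, while the cache-friendly version keeps the prefix byte-for-byte identical and appends any volatile detail at coarse granularity.

```python
from datetime import datetime, timezone

# Anti-pattern (illustrative prompt text): a second-precision timestamp at the top
# of the system prompt changes on every request, so the shared prefix and the
# KV-cache entries behind it can never be reused.
def cache_busting_prompt() -> str:
    return f"Current time: {datetime.now(timezone.utc).isoformat()}\nYou are an agent..."

# Cache-friendly alternative: keep the prefix byte-for-byte identical across requests
# and append volatile details, if they are needed at all, at coarse granularity.
STABLE_PREFIX = "You are an agent...\n<tool definitions, in a fixed order>"

def cache_friendly_prompt(today: str) -> str:
    # `today` is day-level ("2025-07-18"), so the prefix changes at most once per day.
    return f"{STABLE_PREFIX}\nDate: {today}"
```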
If self-hosting models using frameworks like vLLM, ensure prefix/prompt caching is enabled
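For illustration, a hedged sketch of turning this on in vLLM's offline engine, assuming a recent release where the `enable_prefix_caching` option is available; the model name is just a placeholder.

```python
from vllm import LLM, SamplingParams

# Offline engine with automatic prefix caching enabled (assumes a recent vLLM
# release; the model name is a placeholder).
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_prefix_caching=True)

SYSTEM_PREFIX = "You are an agent...\n<stable tool definitions>"  # identical on every call
params = SamplingParams(temperature=0.0, max_tokens=256)

# The second request reuses the KV blocks computed for the shared SYSTEM_PREFIX.
out1 = llm.generate([SYSTEM_PREFIX + "\nUser: summarize report.pdf"], params)
out2 = llm.generate([SYSTEM_PREFIX + "\nUser: list open tasks"], params)
```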
As agents gain more capabilities, their action space grows more complex
Dynamic action spaces (like RAG-based tool loading) cause problems: tool definitions sit near the front of the context, so changing them invalidates the KV-cache, and earlier actions that reference tools no longer in scope confuse the model
Manus uses a context-aware state machine to manage tool availability by masking token logits during decoding
We constrain action selection by masking token logits directly:
Auto: the model may choose to call a function or not
Prefixed with: <|im_start|>assistant
Required: the model must call a function, but the choice is unconstrained
Prefixed with: <|im_start|>assistant<tool_call>
Specified: the model must call a function from a specific subset
Prefixed with: <|im_start|>assistant<tool_call>{"name": "browser_
Consistent tool prefixes (browser_, shell_) allow easy enforcement of tool groups
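A minimal sketch of how these prefills and prefix-based tool groups could be wired up; the Hermes-style markup and tool names are illustrative, and a production setup would pair the prefill with actual logit masking in the decoder.

```python
from typing import Optional

TOOLS = ["browser_open", "browser_click", "shell_exec", "shell_write_file"]

def allowed_tools(prefix: Optional[str]) -> list[str]:
    """Consistent name prefixes (browser_, shell_) make subset selection trivial."""
    if prefix is None:
        return TOOLS
    return [t for t in TOOLS if t.startswith(prefix)]

def build_prefill(mode: str, prefix: Optional[str] = None) -> str:
    """Return the text to pre-fill before decoding, constraining what the model can emit."""
    if mode == "auto":        # may reply in text or call any tool
        return "<|im_start|>assistant"
    if mode == "required":    # must call some tool, choice unconstrained
        return "<|im_start|>assistant<tool_call>"
    if mode == "specified":   # must call a tool from the given prefix group
        return '<|im_start|>assistant<tool_call>{"name": "' + (prefix or "")
    raise ValueError(f"unknown mode: {mode}")

# Example: after a new user message, force an immediate browser action only.
print(build_prefill("specified", "browser_"))   # ...<tool_call>{"name": "browser_
print(allowed_tools("browser_"))                # ['browser_open', 'browser_click']
```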
Modern LLMs offer 128K+ token contexts, but in agentic scenarios observations (web pages, PDFs, logs) can blow past that limit, model performance degrades well before it, and long inputs stay expensive even with prefix caching
Manus treats the file system as ultimate context: unlimited, persistent, and directly operable by the agent
Compression strategies are always designed to be restorable: a web page's content can be dropped as long as its URL is kept, and a document's content as long as its path is kept
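A hedged sketch of what restorable compression can look like: the bulky content is truncated, but the URL or file path it came from stays in context so the agent can re-fetch or re-read it later. Field names are illustrative, not Manus's actual schema.

```python
def compress_observation(obs: dict, max_chars: int = 2_000) -> dict:
    """Truncate bulky observation content while keeping a pointer for restoration."""
    content = obs.get("content", "")
    if len(content) <= max_chars:
        return obs
    compressed = dict(obs)
    compressed["content"] = content[:max_chars] + " …[truncated]"
    # Restorable: the agent can re-open the URL or re-read the file on demand.
    compressed["restore_hint"] = obs.get("url") or obs.get("path")
    return compressed
```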
Manus creates and updates todo.md files during complex tasks
Why this matters:
By constantly rewriting the todo list, Manus recites its objectives into the end of the context, pushing the global plan into the model's recent attention span
This avoids "lost-in-the-middle" issues and reduces goal misalignment
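A minimal sketch of the recitation loop under assumed file and message formats: the todo file is rewritten after each step, and its current contents are appended at the tail of the context.

```python
from pathlib import Path

TODO = Path("todo.md")  # illustrative location

def rewrite_todo(done: list[str], remaining: list[str]) -> str:
    """Rewrite todo.md with checked-off and pending items, then return its text."""
    lines = [f"- [x] {item}" for item in done] + [f"- [ ] {item}" for item in remaining]
    TODO.write_text("\n".join(lines))
    return TODO.read_text()

def recite(messages: list[dict], done: list[str], remaining: list[str]) -> None:
    """Append the freshly rewritten plan to the end of the context."""
    todo_text = rewrite_todo(done, remaining)
    messages.append({"role": "user", "content": f"Current plan:\n{todo_text}"})
```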
Agents make mistakes - that's reality, not a bug
Common impulse: Hide errors, clean up traces, retry actions
Problem: Erasing failure removes evidence
Most effective improvement: Leave wrong turns in context
When the model sees a failed action and the resulting observation, it implicitly updates its internal beliefs, shifting its prior away from similar actions
Error recovery is one of the clearest indicators of true agentic behavior
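A small sketch of what "leaving wrong turns in" means for the trace builder, with an illustrative message structure: the failed action and its error observation are appended rather than erased.

```python
def record_step(messages: list[dict], action: dict, result: dict) -> None:
    """Append the action and its observation, keeping failures in the trace."""
    messages.append({"role": "assistant", "content": f"<tool_call>{action}</tool_call>"})
    if result.get("ok"):
        messages.append({"role": "tool", "content": result.get("output", "")})
    else:
        # Keep the error message verbatim; erasing it removes the evidence the model
        # needs to shift its prior away from repeating the same mistake.
        messages.append({"role": "tool", "content": f"ERROR: {result.get('error', 'unknown')}"})
```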
Few-shot prompting can backfire in agent systems
Language models are excellent mimics - they imitate patterns in context
Danger in tasks with repetitive decisions: when the context fills with similar action-observation pairs, the model drifts into rote imitation, overgeneralizes, or hallucinates
Fix: increase diversity with structured variation in actions and observations, such as alternate serialization templates, varied phrasing, or small noise in order and formatting (see the sketch below)
The more uniform your context, the more brittle your agent becomes
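A minimal sketch of structured variation, with illustrative templates: repeated observations are serialized with slightly different but semantically equivalent wording, so the context never becomes a wall of identical-looking pairs.

```python
import random

# Semantically equivalent serialization templates for the same observation.
TEMPLATES = [
    "Result of {tool}: {output}",
    "{tool} returned:\n{output}",
    "Observation from {tool}:\n{output}",
]

def serialize_observation(tool: str, output: str, rng: random.Random) -> str:
    """Pick a template so consecutive observations don't look token-for-token alike."""
    return rng.choice(TEMPLATES).format(tool=tool, output=output)

# Usage: a per-task seeded RNG keeps runs reproducible while breaking uniformity.
rng = random.Random(42)
print(serialize_observation("shell_exec", "exit code 0", rng))
```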
Context engineering is still an emerging science—but for agent systems, it's already essential
How you shape the context ultimately defines how your agent behaves: how fast it runs, how well it recovers, and how far it scales
"The agentic future will be built one context at a time. Engineer them well."
These patterns worked for us after repeated rewrites, dead ends, and real-world testing across millions of users